Detecting synchronization clusters in multivariate time series via coarse- graining of Markov chains 
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Synchronization cluster analysis is an approach to the detection of underlying structures in data sets of mul- 
tivariate time series, starting from a matrix R of bivariate synchronization indices. A previous method utilized 
the eigenvectors of R for cluster identification, analogous to several recent attempts at group identification using 
eigenvectors of the correlation matrix. All of these approaches assumed a one-to-one correspondence of domi- 
nant eigenvectors and clusters, which has however been shown to be wrong in important cases. We clarify the 
usefulness of eigenvalue decomposition for synchronization cluster analysis by translating the problem into the 
language of stochastic processes, and derive an enhanced clustering method harnessing recent insights from the 
coarse-graining of finite-state Markov processes. We illustrate the operation of our method using a simulated 
system of coupled Lorenz oscillators, and we demonstrate its superior performance over the previous approach. 
Finally we investigate the question of robustness of the algorithm against small sample size, which is important 
with regard to field applications. 
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I. INTRODUCTION 

Studying the dynamics of complex systems is relevant in 
many scientific fields, from meteorology JTh over geophysics 
|0] to economics [3] and neuroscience J3>|5t|. In many cases, 
this complex dynamics is to be conceived as arising through 
the interaction of subsystems, and it can be observed in the 
form of multivariate time series where measurements in dif- 
ferent channels are taken from the different parts of the sys- 
tem. The degree of interaction of two subsystems can then 
be quantified using bivariate measures of signal interdepen- 
dence 0, [Ml ■ A wide variety of such measures has been 
proposed, from the classic linear correlation coefficient over 
frequency-domain variants like magnitude squared coherence 
|0] to general entropy-based measures lioll . A more specific 
model of complex dynamics that has found a large number of 
applications is that of a set of self-sustained oscillators whose 
coupling leads to a synchronization of their rhythms |[l~ll [l2tl . 
Especially the discovery of the phenomenon of phase synchro- 
nization ill 311 led to the widespread use of synchronization in- 
dices in time series analysis IU4l[T5l[T^1 . 

However, by applying bivariate measures to multivariate 
data sets an A^-dimensional time series is described by an 
N x iV-matrix of bivariate indices, which leads to a large 
amount of mostly redundant information. Especially if ad- 
ditional parameters come into play (nonstationarity of the dy- 
namics, external control parameters, experimental conditions) 
the quantity of data can be overwhelming. Then it becomes 
necessary to reduce the complexity of the data set in such a 
way as to reveal the relevant underlying structures, that is, to 
use genuinely multivariate analysis methods that are able to 
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detect patterns of multichannel interaction. 

One way to do so is to trace the observed pairwise cor- 
respondences back to a smaller set of direct interactions us- 
ing e.g. partial coherence |0, [nil , an approach that has re- 
cently been extended to phase synchronization lfl8ll . Another 
and complementary way to achieve such a reduction is cluster 
analysis, that is, a separation of the parts of the system into dif- 
ferent groups, such that signal interdependencies within each 
group tend to be stronger than in between groups. This de- 
scription of the multivariate structure in the form of clusters 
can eventually be enriched by the specification of a degree of 
participation of an element in its cluster. The straightforward 
way to obtain clusters by app lying a threshold to the matrix 
entries has often been used 01 Ufa 1231, but it is very suscepti- 
ble to random variation of the indices. As an alternative, sev- 
eral attempts have recently been made to identify clusters us- 
ing eigenvectors of the correlation matrix Il9ll2lll22ll . which 
were motivated by the application of random matrix theory to 
empirical correlation matrices JHHH]. 

In the context of phase synchronization analysis, a first ap- 
proach to cluster analysis was based on the derivation of a 
specific model of the internal structure of a synchronization 
cluster ll24L l25ll . The resulting method made the simplifying 
assumption of the presence of only one cluster in the given 
data set, and focused on quantifying the degree of involvement 
of single oscillators in the global dynamics. Going beyond 
that, the participation index method 12611 defined a measure 
of oscillator involvement based on the eigenvalues and eigen- 
vectors of the matrix of bivariate synchronization indices, and 
attributed oscillators to synchronization clusters based on this 
measure. 

But despite the apparent usefulness of eigenvalue decom- 
position for the purposes of group identification, beyond some 
phenomenological evidence no good reason has been put for- 
ward why an eigenvector of a synchronization matrix should 
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directly indicate the system elements taking part in a cluster. 
Moreover, in a recent survey of the performance of synchro- 
nization cluster analysis in simulation and field data 12711 it has 
been shown that there are important special cases — clusters of 
similar strength that are slightly synchronized to each other — 
where the assumed one-to-one correspondence of eigenvec- 
tors and clusters is completely lost. 

In this paper, we provide a better understanding of the role 
of eigenvectors in synchronization cluster analysis, and we 
present an improved method for detecting synchronization 
clusters, the eigenvector space approach. The organization 
of the paper is as follows: In Section II, we briefly recall the 
definition of the matrix of bivariate synchronization indices R 
as the starting point of the analysis. We motivate its transfor- 
mation into a stochastic matrix P describing a Markov chain 
and detail the properties of that process. Utilizing recent re- 
sults on the coarse-graining of finite-state Markov processes 
lEH I29L [30h we derive our method of synchronization clus- 
ter analysis, and we illustrate its operation using a system of 
coupled Lorenz oscillators. In Section EH, we compare the 
performance of the eigenvector space method with that of the 
previous approach, the participation index method lElill . and 
we investigate its behavior in the case of small sample size, 
which is important with regard to the application to empirical 
data. 



II. METHOD 

A. Measuring synchronization 

Synchronization is a generally occurring phenomenon in 
the natural sciences, which is defined as the dynamical ad- 
justment of the rhythms of different oscillators II ill . Because 
an oscillatory dynamics is described by a phase variable <f>, a 
measure of synchronization strength is based on the instanta- 
neous phases 4>i m of oscillators i = 1 . . . N, where the index 
in enumerates the values in a sample of size n. The nowa- 
days commonly used bivariate index of phase synchronization 
strength 11151 llq, Il8l |2(X |24 l26l l27ll results from the applica- 
tion of the first empirical moment of a circular random vari- 
able 113 ill to the distribution of the phase difference of the two 
oscillators: 

Rij = iW(^) . (i) 

m—l 

The measure takes on values from the interval [0,1], repre- 
senting the continuum from no to perfect synchronization of 
oscillators i and j; the matrix R is symmetric, its diagonal be- 
ing composed of Is. Special care has to be taken in applying 
this definition to empirical data, because the interpretation of 
R as a synchronization measure in the strict sense only holds 
if phase values were obtained from different self-sustained os- 
cillators. 

The determination of the phase values <f>i m generally de- 
pends on the kind of system or data to be investigated. For the 



analysis of scalar real-valued time series s, (t) that are charac- 
terized by a pronounced dominant frequency, the standard ap- 
proach utilizes the associated complex-valued analytic signal 
Zi(t) B32I1 . within which every harmonic component of Sj(i) is 
extended to a complex harmonic. The analytic signal is com- 
monly defined [13] as 

z i (t) = s i (t)+ills i (t), (2) 

where Hs^ denotes the Hilbert transform of the signal Sj, 

H Si (*) = ip.v/°° (3) 

and where P. V. denotes the Cauchy principal value of the inte- 
gral. The instantaneous phase of the time series is then defined 

as 

4>i{t) = argzi(i). (4) 

Equivalently, the analytic signal can be obtained using a filter 
that removes negative frequency components, 

z l (t)=T- 1 (T[s l (t)} [1 + sgn(w)]) , (5) 

where T denotes the Fourier transform into the domain of 
frequencies uj and sgn denotes the sign function 113 3fl . This 
definition is more useful in practice because it can be straight- 
forwardly applied to empirical time series, which are sampled 
at a finite number n of discrete time points t m , Si m — Si(t m ). 
If several time series (realizations of the same process) are 
available, the obtained phase values can be combined into a 
single multivariate sample of phases (f>i m , where the index 
m = 1 . . . n now enumerates the complete available phase 
data. 



B. Cluster analysis via Markov coarse graining 

In the participation index method [26], the use of eigenvec- 
tors of R for synchronization cluster analysis was motivated 
by the investigation of the spectral properties of correlation 
matrices in random matrix theory ll23ll . Another context where 
eigenvalue decomposition turns up naturally is the computa- 
tion of matrix powers, which becomes as simple as possible 
using the spectral representation of the matrix. 

Powers of R have a well defined meaning in the special case 
of a binary- valued matrix 6 {0, 1}), as it is obtained for 
instance by thresholding: The matrix entries of R a count the 
number of possible paths from one element to another within 
a steps, i.e., they specify the degree to which these elements 
are connected via indirect links of synchrony. By analogy, 
we interpret (R a )ij also in the general case as quantifying 
the degree of common entanglement of two elements i and 
j within the same web of synchronization relations. In the 
following we will call this quantity the a-step synchronization 
strength of two oscillators because it reduces to the original 
bivariate synchronization strength R^j in the case a = 1. 

This synchronization strength over a steps is relevant for 
synchronization cluster analysis, because in a system where 
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synchronization clusters are present it is possible that the de- 
gree of direct bivariate synchrony of two elements is not very 
strong, but they are both entangled into the same web of links 
of synchrony. These indirect links, which make the two ele- 
ments members of the same synchronization cluster, become 
visible in R a . 

Moreover, with increasing power a the patterns of syn- 
chrony within a cluster (the matrix columns) become more 
similar, approaching the form of one of the dominant eigen- 
vectors. If there are different clusters in the system, a suitable 
a can be chosen such that each cluster exhibits a different pat- 
tern, representing the web of synchronization relations it is 
comprised of. These patterns constitute signatures by which 
elements can be attributed to clusters. The cluster signatures 
are related to the dominant eigenvectors of R; by transition 
to larger a they become even more dominant, leading to an 
effective simplification of the matrix. 

For the identification of the members of a cluster only the 
patterns of synchrony are relevant, while the absolute size of 
elements of different columns diverges with a, so that some 
sort of normalization is called for. Different normalization 
schemes might be used for this purpose. However, using the 
L 1 -norm the procedure can be simplified, because for the nor- 
malized version of the synchronization matrix, given by 



Pi, 



Ri 



(6) 



it holds that powers of P are automatically normalized, too. 
Moreover, the L 1 -normalized matrix P is a column-stochastic 
matrix, that is, it can be interpreted as the matrix of i *— j tran- 
sition probabilities describing a Markov chain, whose states 
correspond to the elements of the original system. Via this 
connection, the tools of stochastic theory and especially recent 
work on the coarse-graining of finite-state Markov processes 
l28l [29l l30ll can be utilized for the purposes of synchroniza- 
tion cluster analysis. 

The Markov process defined in this way possesses some 
specific properties 13411 : It is aperiodic because of the nonzero 
diagonal entries of the matrix, and it is in general irreducible 
because the values of empirical Rij for i ^ j will also almost 
never be exactly zero. For a finite-state process, these two 
properties amount to ergodicity, which implies that any dis- 
tribution over states converges to a unique invariant distribu- 
tion p™, corresponding to the eigenvector of P for the unique 
largest eigenvalue 1. This distribution can be computed from 
R as 
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where the vector components of p^ are denoted by p] . With 
the matrix R also the stationary flow given by 



(o) 



Vji 



(8) 



is symmetric, i.e., the process fulfills the condition of detailed 
balance P{j p^ = Pji pf \ which makes eigenvalues and 
eigenvectors of P real-valued 12811 . 



For the Markov process, the a-step synchronization 
strength considered above translates into transitions between 
states over a period of r time steps. To compute the corre- 
sponding transition matrix P T the eigenvalue decomposition 
of P is used. If A& with k = ... N — 1 denote the eigen- 
values of P, and the right and left eigenvectors p% and Ak are 
scaled such that the orthonormality relation 



Ak Pi 



n-i 



is fulfilled, the spectral representation of P is given by 
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and consequently 



pT = £ A fc P kAk - 
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We assume that eigenvalues are sorted such that Ao = 1 > 
Ai| > I A2 1 > ■•• > I Ajv— 1 1 - The scaling ambiguity left by 
the orthonormality relation is resolved by choosing 



Pik = p- 0) Mi 



(12) 



(where pik and Aki denote the vector components of pk and 
A)., respectively), which leads to the normalization equations 
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with the special solutions p^ 
ally, a generalized orthonormality relation 
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follows for the right eigenvectors. 

The convergence of every initial distribution to the station- 
ary distribution p'°) corresponds to the fact that because of 
non-vanishing synchronies the whole system ultimately forms 
one single cluster. This perspective belongs to a timescale 
t — ► 00, at which all eigenvalues A£ go to except for the 
largest one, AJ = 1. In the other extreme of a timescale r = 0, 
P T becomes the identity matrix, all of its columns are differ- 
ent, and the system disintegrates into as many clusters as there 
are elements. For the purposes of cluster analysis, intermedi- 
ate timescales are of interest on which many but not all of the 
eigenvalues are practically zero. If we want to identify q clus- 
ters, we expect to find that many different cluster signatures, 
and that means we have to consider P T at a time scale where 
eigenvalues A£ may be significantly different from zero only 
for the range k = . . . q — 1. 



This is achieved by determining t such that \X q 



0. Us- 



ing a parameter £ <C 1 chosen to represent the quantity that 
is considered to be practically zero (e.g. ( = 0.01), from 
W\ T — C we calculate the appropriate timescale for a cluster- 
ing into q clusters as 
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The vanishing of the smaller eigenvalues at a given timescale 
describes the loss of internal differentiation of the clusters, the 
removal of the structural features encoded in the correspond- 
ing weaker eigenvectors. On the other hand, the differentia- 
tion of clusters from each other via the dominant eigenvec- 
tors will be the clearer the larger the remaining eigenvalues 
are, especially X r q _ 1 - This provides a criterion for selecting 
the number of clusters q: the clustering will be the better the 
larger \X q -i\ T ^ is. Equivalently, we select q based on the 
timescale separation factor 



r(q - 1) 
r(q) 



log I \ | 
log|A,_i|' 
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which is independent of the particular choice of f, and invari- 
ant under rescaling of the time axis. This criterion gives a 
ranking of the different possible choices (from 1 to N — 1). 
The fact that Ao = 1 and therefore F(l) — oo implies a 
limitation of this approach, since the choice q = 1 is al- 
ways characterized as absolutely optimal. Therefore the first 
meaningful — and usually best — choice is the second entry in 
the q ranking list. 

To determine which elements belong to the same cluster, 
we need a measure d of the dissimilarity of cluster signatures, 
that is, of the column vectors of P T . Since these vectors be- 
long to the space of right eigenvectors of P, the appropriate 
dissimilarity metric is based on the norm corresponding to the 
normalization equation for right eigenvectors [Eq. ([T3l left]: 
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The resulting column vector dissimilarity 
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has the convenient property that the dimensionality of the 
space within which the clustering has to be performed can 
be reduced, because the expression obtained by inserting the 
spectral representation for the matrix entries of P T , 



simplifies to 
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using the generalized orthonormality of right eigenvectors, 
Eq. ( fT4b . Since for appropriately chosen t = r(q) contri- 
butions for larger k vanish starting from k = q, and because 
A oi = 1 for all i, it is sufficient to let the sum run over the 
range 1 ... q — 1. The dissimilarity d can therefore be inter- 
preted as the Euclidean distance within a (q — 1) -dimensional 
left eigenvector space, where each element j is associated 
with a position vector 



To actually perform the clustering, we can in principle 
use any algorithm that is designed to minimize the sum of 
within-cluster variances. Our implementation derives from 
the observation that clusters in eigenvector space form a q- 
simplex lf29[ l30ll . A first rough estimate of the cluster loca- 
tions can therefore be obtained by searching for the extreme 
points of the data cloud, employing a subalgorithm described 
in Ref. [30]: Determine the point farthest from the center of 
the cloud, then the point farthest from the first one; then it- 
eratively the point farthest from the hyperplane spanned by 
all the previously identified points, until q points are found. 
Using the result of this procedure as initialization, the stan- 
dard k-means algorithm 113511 that normally tends to get stuck 
in local minima converges in almost all cases quickly onto the 
correct solution. 

In summary, the algorithmic steps ll36Tl of the eigenvec- 
tor space method introduced in this paper are: (1) Calculate 
the matrix of bivariate synchronization indices R^j, Eq. ([T). 
(2) Convert the synchronization matrix R into a transition 
matrix P, Eq. (|6}. (3) Compute the eigenvalues A& and left 
eigenvectors Af. of P. (4) Select the number of clusters q, 
q > 1, with the largest timescale separation factor F(q), 
Eq. ( [ToT l. (5) Determine the positions o(j), Eq. ( 12 U . in eigen- 
vector space for r = r(q), Eq. <fT3T >. (6) Search for q extreme 
points of the data cloud. (7) Use these as initialization for 
k-means clustering. 



C. Illustration of the eigenvector space method 

In order to illustrate the operation of the method, we ap- 
ply it to multivariate time series data obtained from simulated 
nonlinear oscillators, coupled in such a way as to be able to 
observe synchronization clusters of different size as well as 
unsynchronized elements. The system consists of N = 9 
Lorenz oscillators that are coupled diffusively via their z- 
components: 



x,j = 10':;/, 

i)j = 28 Xj - 

= — 3 z j 



Xj), 



(22) 



5(j) = (\X k \ T A kj ) 



1 . 
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The coupling coefficients were chosen from {0, 1} to im- 
plement the coupling configuration depicted in Fig.[T](a), such 
that oscillators #2^1 are unidirectionally driven by #1, oscil- 
lators #7 and 8 driven by #9, and #5 and 6 are uncoupled. 
These differential equations were numerically integrated us- 
ing a step size of At = 0.01, starting from randomly cho- 
sen initial conditions. After discarding an initial transient of 
10 4 data points, further 4 x 10 4 samples entered data process- 
ing. Instantaneous phases <fij m of oscillators j at time points 
t m = (m — 1) At were determined from the z-components 
using the analytic signal approach after removal of the tem- 
poral mean, and bivariate synchronization strengths Rij were 
computed. 

The outcomes of the eigenvector space method applied to 
the resultant matrix of bivariate indices are presented in Fig.Q] 
Figure[T](b) shows the spectrum of eigenvalues A^ of the tran- 
sition matrix P with the corresponding timescales r(q) and 
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FIG. 1 : Application of the eigenvector space method to a system of 
nine partially coupled Lorenz oscillators, (a) Coupling configuration: 
The left group of four oscillators is driven by #1, the right group of 
three driven by #9, the remaining two are uncoupled, (b) Eigenval- 
ues Afc, timescales r(q), and timescale separation factors F(q). The 
maximal separation factor F(4) indicates the presence of four clus- 
ters, (c) Positions attributed to oscillators in 3-dimensional eigen- 
vector space (01,02,03). The clustering by the k-means algorithm 
results in a cluster composed of oscillators #1-^1 (□), two single- 
element clusters consisting in oscillators #5 (o) and #6 (*), respec- 
tively, and a cluster composed of oscillators #7-9 (o). 



timescale separation factors F(q). A gap in the eigenvalue 
spectrum between indices k — 3 and 4 translates into a max- 
imum timescale separation factor for q = 4, which recom- 
mends a search for four clusters in the eigenvector space for 
timescale r = 3.5. This 3-dimensional space is depicted in 
Fig.QJc), where the expected grouping into four clusters can 
be clearly recognized in the arrangement of elements j with 
positions o(j). These four clusters that correspond to the two 
groups of driven oscillators and the two uncoupled oscillators 
(each of which forms a single-element cluster) are easily iden- 
tified by the k-means algorithm. The results shown here were 
obtained using £ = 0.01; alternative choices of 0.1 and 0.001 
yielded the same clustering. 



III. PERFORMANCE 

To assess the performance of the eigenvector space method 
introduced in this paper, we compare it with the previous ap- 
proach to synchronization cluster analysis based on spectral 
decomposition. For reference, we briefly recollect the impor- 
tant details. 

The participation index method l26ll is based on the eigen- 
value decomposition of the symmetric synchronization matrix 
R itself, into eigenvalues rjf. and L 2 -normalized eigenvectors 
tj/j. Each of the eigenvectors that belong to an eigenvalue 
rjk > 1 is identified with a cluster, and a system element 
j is attributed to that cluster k in which it participates most 
strongly, as determined via the participation index 



n 



(23) 



where Vjk are the eigenvector components of Vk. The method 
performs quite well in many configurations, but it encounters 
problems when confronted with clusters of similar strength 
that are slightly synchronized to each other, which was 
demonstrated in Ref. [27] using a simulation. 

Here we employ a refined version of that simulation to com- 
pare the two methods. We consider a system of N = 32 os- 
cillators forming two clusters, and check whether the methods 
are able to detect this structure from the bivariate synchroniza- 
tion matrix R for different degrees of inter-cluster synchrony 
Pint. The cluster sizes are controlled via a parameter r, such 
that the first cluster comprises elements j = 1 ... r, the sec- 
ond (r + 1) . . . N. 

To be able to time-efficiently perform a large number of 
simulation runs and to have precise control over the structure 
of the generated synchronization matrices, we do not imple- 
ment the system via a set of differential equations. Instead, 
our model is parametrized in terms of the population value of 
the bivariate synchronization index, Eq. (Q}, 



Pij = l(exp[ 



(24) 



(where (•} denotes the expectation value), which is the first 
theoretical moment of the circular phase difference distribu- 
tion JUl]. For i,j within the same cluster py is fixed at a 
value of pi = P2 = 0.8. For inter-cluster synchronization 
relations it is set to pi nt , which is varied from up to 0.8 such 
that the two-cluster structure almost vanishes. 
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FIG. 2: Comparative performance of the participation index (a) and 
the eigenvector space method (b). The methods are tested on a sys- 
tem of N = 32 elements, divided into two clusters containing r 
and N — r elements, respectively. The inter-cluster synchronization 
strength p^t is varied from up to the value of intra-cluster syn- 
chronization 0.8. Synchronization matrices are generated based on 
samples of size n = 200. The quantity shown is the relative fre- 
quency (over 100 trials) with which the respective algorithm failed 
to recover exactly the given two-cluster structure; it is visualized in 
gray scale, covering the range from (white) to 1 (black). Compari- 
son shows that in a large area along r = 16 where the participation 
index method fails, the eigenvector space method introduced in this 
paper performs perfectly. 



To be able to properly account for the effect of random vari- 
ations of Rij around py due to finite sample size n we gener- 
ated samples of phase values, using an extension of the single- 
cluster model introduced in Ref . B24I1 : The common behavior 
of oscillators within each of the two clusters is described by 
cluster phases <f>i and <f>2. The phase differences between the 
members of each cluster and the respective cluster phase, 



<J>i for j = 1 . . . r, 
$2 forj = (r+l) 



. N, 



(25) 



as well as the phase difference of the two cluster phases, 

A$ = $ 2 -$i, (26) 

are assumed to be mutually independent random variables, 
distributed according to wrapped normal distributions 13111 
with circular moments pic, P2C, an d Pec, respectively. Since 
the summation of independent circular random variables re- 
sults in the multiplication of their first moments 12411 . for the 
relation of model parameters and distribution moments holds 
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For the performance comparison of the two methods, n = 200 
realizations of this model of the multivariate distribution of 
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FIG. 3: Performance of the eigenvector space method depending on 
the sample size n, investigated using the two-cluster system of Fig.|2] 
for different values of the inter-cluster synchronization strength pj nt . 
The quantity shown here is the proportion of values of the parameter 
r (controlling cluster sizes) at which the two-cluster structure failed 
to be recovered, visualized on a gray scale from (white) to 1 (black). 
The plot shows that the ability of the eigenvector space method to 
correctly identify clusters up to high values of p mt breaks down only 
for very small sample sizes n. 



phases (j>j were generated for each setting of the parameters, 
and synchronization indices R,^ were calculated via Eq. ([T). 

The clustering results are presented in Fig.|2]for the partic- 
ipation index and the eigenvector space method (using £ = 
0.01). The quantity shown is the relative frequency (over 100 
instances of the matrix R) of the failure to identify correctly 
the two clusters built into the model. Figure |2]( a) shows that 
the participation index method fails systematically within a 
region located symmetrically around r = N/2 = 16 (clusters 
of equal size). The region becomes wider for increasing p mt 
but is already present for very small values of the inter-cluster 
synchronization strength. In contrast, the eigenvector space 
approach (b) is able to perfectly reconstruct the two clusters 
for all values of r up to very strong inter-cluster synchroniza- 
tion. It fails to correctly recover the structure underlying the 
simulation data only in that region where inter-cluster syn- 
chronization indices attain values comparable to those within 
clusters, i.e., only where there are no longer two different clus- 
ters actually present. These results demonstrate that the eigen- 
vector space method is a clear improvement over the previous 
approach. 

For real-world applications, a time series analysis method 
has to be able to work with a limited amount of data. In the 
case of synchronization cluster analysis, a small sample size 
attenuates the observed contrast between synchronization re- 
lations of different strength, making it harder to discern clus- 
ters. Using the two-cluster model described above, in a fur- 
ther simulation we investigated the effect of the sample size 
n on the performance of the eigenvector space method. Pa- 
rameters pi n t and r were varied as before, and synchroniza- 
tion matrices were generated for n — 10 . . . 200 (in steps of 
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10). For each value of p m and n, as a test quantity the pro- 
portion of r-values for which the algorithm did not correctly 
identify the two clusters was calculated. The result shown 
in Fig.[3]demonstrates that the performance of the eigenvector 
space method degrades only very slowly with decreasing sam- 
ple size. The method seems to be able to provide a meaningful 
clustering down to a data volume of about n = 30 independent 
samples, making it quite robust against small sample size. 

IV. CONCLUSION 

We introduced a method for the identification of clusters 
of synchronized oscillators from multivariate time series. By 
translating the matrix of bivariate synchronization indices R 
into a stochastic matrix P describing a finite-state Markov 
process, we were able to utilize recent work on the coarse- 
graining of Markov chains via the eigenvalue decomposition 
of P. Our method estimates the number of clusters present in 
the data based on the spectrum of eigenvalues, and it repre- 
sents the synchronization relations of oscillators by assigning 
to them positions in a low-dimensional space of eigenvectors, 



thereby facilitating the identification of synchronization clus- 
ters. We showed that our approach does not suffer from the 
systematic errors made by a previous approach to synchro- 
nization cluster analysis based on eigenvalue decomposition, 
the participation index method. Finally, we demonstrated that 
the eigenvector space method is able to correctly identify clus- 
ters even given only a small amount of data. This robustness 
against small sample size makes it a promising candidate for 
field applications, where data availability is often an issue. 
Concluding we want to remark that though the method was de- 
scribed and assessed in this paper within the context of phase 
synchronization analysis, our approach might also give use- 
ful results when applied to other bivariate measures of signal 
interdependence. 
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